ABSTRACT

This  report  describes  a  wafer-scale  design  for  an  infrared  focal  plane  processor 
(FPP)  to  operate  in  a  space  environment.  The  functions  of  a  generic  focal  plane 
processor are described, followed by a detailed discussion of a design to be implemented
in RVLSI wafer-scale technology for a space-based application. A prototype
of this processor (PFPP) will actually be fabricated in rad-hard silicon-on-insulator
3-µm technology. Finally, the question of reliability is explored, and a philosophy
of fault tolerance is presented which will lead to a reasonable probability of success
over  a  five-year  lifetime. 

DESIGN OF A WAFER-SCALE FOCAL PLANE PROCESSOR


1.  INTRODUCTION 

1.1  SCANNING  ARRAYS 

Consider  a  generic  scanning  infrared  sensor,  consisting  of  a  detector  array  with  n  rows  and  k 
time  delay  integration  (TDI)  columns.  (The  entire  arrangement  may  then  be  duplicated  for  each 
of  m  color  bands.  These  will  be  ignored  hereafter  for  the  sake  of  simplicity.)  One  can  imagine  this 
array  scanning  horizontally  across  an  image  in  order  to  form  a  two-dimensional  picture.  It  moves 
horizontally  by  one  column  every  dwell,  and  in  addition  is  oversampled,  typically  by  a  factor  of 
three,  so  that  the  entire  set  of  detectors  is  read  out  three  times  per  dwell.  Data  from  a  column 
with  TDI  position  k  must  be  delayed  k  -  1  of  these  dwells  before  being  added  to  subsequent  data 
from  the  same  row  in  order  to  perform  the  time  alignment  needed  for  integration. 

1.2  FOCAL  PLANE  PROCESSOR 

The  focal  plane  processor  (FPP),  also  known  as  a  time  dependent  processor,  is  responsible 
for  the  initial  signal  processing  of  data  from  an  array  of  photodetectors.  From  a  computational 
point  of  view,  the  initial  focal  plane  processing  is  characterized  by  two  salient  points:  (a)  the  input 
data  stream  is  massively  parallel:  each  detector  in  the  scanning  array  is  sampled  after  every  dwell 
time  and  is  treated  essentially  identically,  and  (b)  the  algorithms  applied  to  each  detector  sample 
are relatively simple and well understood. These two points taken together favor a hardwired,
single  instruction  multiple  data  (SIMD)  architecture  for  the  FPP.  This  architecture,  together  with 
the  requirements  of  low  power  consumption,  low  weight,  and  high  reliability  imposed  by  a  space 
environment,  makes  wafer  scale  integration  (WSI)  a  natural  choice  for  the  processor  technology. 
Nonetheless,  even  the  relatively  simple  processing  requirements  of  the  FPP  impose  a  higher  degree 
of  internal  differentiation  on  the  WSI  processor  (i.e.,  more  cell  types)  than  has  previously  been 
demonstrated.  Design  of  such  a  WSI  processor  is  a  nontrivial  task,  and  represents  the  subject  of 
this  report. 

The  functions  of  the  FPP  may  now  be  discussed  in  greater  detail.  The  incoming  data  must 
be  calibrated  to  correct  for  responsivity  differences  among  detectors,  and  samples  which  have 
been corrupted by the effects of γ radiation need to be recognized and discarded. Following that,
two other signal processing functions, time-delay integration and matched filtering and threshold
detection, must be performed. At this point, the object dependent processor (ODP), whose load
depends on the number of objects over threshold, takes over. These four major functional units are
described  briefly  in  the  order  in  which  the  data  pass  through  them. 

1.2.1  Calibration 

Each  pixel  in  the  detector  array  will  have  a  slightly  different  dark  current  and  responsivity, 
which  must  be  corrected.  If  this  function  has  not  been  implemented  in  the  analog  front  end,  it  is 
handled in the FPP via an addition and multiplication. In principle, nonlinear responsivities could
also  be  calibrated  out.  This  is  rarely  done  in  practice  due  to  the  difficulty  of  finding  appropriate 
calibration  standards. 


1.2.2  Time  Alignment 

The  earlier  columns  of  the  scanning  array  must  be  delayed  before  being  added  to  later  columns. 
This  function,  which  would  be  performed  by  a  CCD  shift  register  in  analog  implementations,  is 
implemented  digitally  as  a  circular  buffer. 


1.2.3  Gamma  Circumvention 

The detection of γ-affected data is very much like a constant false alarm rate (CFAR) detector, where the threshold is
set  to  a  certain  number  of  standard  deviations  beyond  the  mean.  A  current  estimate  of  the  mean 
and  standard  deviation  of  the  signal  is  obtained  using  various  semiheuristic  methods,  and  the 
ensemble  of  TDI  samples  corresponding  to  a  given  point  is  compared  with  a  threshold  based  on 
this estimate. Samples above this threshold are assumed to be contaminated by γ-induced electrons
and  are  discarded.  The  remaining  samples  are  then  averaged  together  to  form  the  TDI  output. 


1.2.4  Matched  Filter  and  Detector 

The  output  of  time  alignment  is  then  run  through  an  FIR  filter  which  compensates  for  the 
combined  effect  of  oversampling  and  the  point  spread  function  of  the  optics.  In  the  simplest 
implementation,  the  detector  is  simply  a  comparator.  More  sophisticated  FPPs  may  incorporate 
more  complicated  circuitry,  e.g.,  Laplacian  filters  to  remove  nuclear  background  effects. 


1.3  FAULT  TOLERANCE 

The  goal  of  a  five-year  mission  lifetime,  combined  with  the  expected  reliability  of  wafer-scale 
circuits,  imposes  a  fault-tolerant  structure  on  the  design.  The  approach  taken  here  is  to  have 
redundant  circuit  elements  which  may  be  switched  in  as  needed  via  multiplexors.  There  is  a  design 
tradeoff  to  be  made  on  the  size  of  these  fault-tolerant  elements  -  too  small,  and  the  switching 
circuitry  becomes  cumbersome;  too  large,  and  the  probability  of  and  penalty  for  failure  both  become 
excessive. 

As will be seen below, this tradeoff was one of the factors influencing the choice of lower-capability
serial arithmetic processors, rather than higher-capability parallel ones. The fault-tolerant
unit was then chosen to be a complete processing element (PE), comprising all four functional
units. 

1.4 WAFER SCALE INTEGRATION


Design of an FPP to be realized in wafer scale technology must take into account the requirements
of this technology. Chiefly, this means that it must be possible to lay out the processor on
a  wafer,  and  that  the  processor  must  be  manufacturable  with  a  reasonable  yield. 


1.4.1  Serial  versus  Parallel  Arithmetic 

The  layout  problem  became  evident  early  in  the  consideration  of  a  parallel  processor.  Since  the 
processor was designed for 12-bit arithmetic, utilizing a 35-µm wire pitch resulted in each bus being
0.4  mm  wide.  The  combination  of  a  fault-tolerant  architecture  and  the  requirement  for  processing 
parallel  TDI  stages  leads  naturally  to  a  design  in  which  several  buses  lie  side  by  side.  The  resulting 
“Los Angeles effect” produces a wafer in which buses are a significant fraction of the total area (see
Section 3.2). This fact led to the consideration of nibble-wide buses. One-bit nibbles were rapidly
realized  to  be  most  appropriate,  at  least  in  the  near  term. 


1.4.2  Defect  Tolerance 

Any  process  will  have  a  small  number  of  manufacturing  defects.  A  circuit  containing  as  many 
elements  as  a  wafer-scale  processor  will  have  a  yield  approaching  zero  unless  a  way  is  found  to 
correct  the  defects  after  manufacture.  In  the  restructurable  VLSI  processes,  redundant  elements 
called restructurable cells are laid down. These are then connected together after testing [9]. Hence,
any  design  must  include  identification  of  suitable  restructurable  cells.  These  cells  must  be  relatively 
small  (<  15,000  transistors)  so  that  their  yield  is  good,  yet  be  common  and  few  in  type  to  simplify 
design  and  mask  production.  Ideally,  they  should  bear  some  simple  relationship  to  the  functions 
of the processor. All these goals are furthered by an architecture based on a multitude of low-capability
serial elements, rather than a few higher-capability parallel ones. In particular, we find
that  the  restructurable  cells  can  be  just  the  four  functional  units  discussed  in  Section  1.2.* 


1.5  ORGANIZATION  OF  REPORT 

This  report  is  divided  into  five  sections.  The  present  section  introduces  an  FPP  and  its 
functions to those unfamiliar with one, and identifies the principal issues that drive the design.
Section 2 begins by presenting a set of strawman requirements for a space-based IR sensor. These

* Late in the design of the wafer, the gamma circumvention circuit was in fact split into two
smaller  parts,  one  for  TDI  summation  (left  side  of  Figure  2-5)  and  one  for  gamma  threshold 
generation  (right  side).  The  change  was  made  for  producibility  reasons;  the  full  gamma  cell  would 
otherwise  have  been  40  percent  larger  than  the  next  largest  cell  in  the  PFPP.  For  the  purposes  of 
this  report,  however,  the  two  cells  will  be  considered  as  one. 
requirements motivate the design of the major functional units of a prototype wafer-scale FPP,
which  are  described  in  some  detail  in  the  remainder  of  the  section. 

Sections  3  and  4  give  a  closer  look  at  some  of  the  critical  design  methodology.  Section  3 
describes  in  more  detail  the  area  calculations  which  illuminated  the  principal  problem  in  the  initial 
design  of  the  WSI  prototype  FPP:  getting  enough  processors  on  the  wafer  to  ensure  a  reasonable 
probability  of  success.  Success  in  this  sense  must  embrace  both  initial  yield  (defect  tolerance)  and 
reliability  in  use  (fault  tolerance).  The  solution  to  this  problem  is  the  use  of  bit-serial  arithmetic. 
Section  4  describes  the  part-stress-analysis  approach  [4]  used  to  estimate  the  reliability  of  the  PFPP 
and  its  subunits.  Section  4  also  presents  a  bottom-up  calculation  and  rationale  for  the  reliability 
parameters  chosen.  A  redundant  (M-of-N)  processing  element  architecture  is  employed  to  achieve 
acceptable  mission  life  given  the  expected  subunit  reliability. 

Following  the  report  conclusion,  two  appendices  present  more  in-depth  treatments  of  roundoff 
errors  in  TDI  summing,  and  an  alternate  approach  to  infrared  detection  in  the  presence  of  gamma 
radiation. 

2. DESIGN OF A PROTOTYPE FOCAL PLANE PROCESSOR


2.1  WAFER-LEVEL  DESCRIPTION 

The  parameters  for  this  design  were  based  on  the  conclusions  of  a  number  of  classified  studies 
reflecting  the  projected  requirements  of  the  Space  Surveillance  and  Tracking  System  (SSTS).  The 
strawman  sensor  point  design  calls  for  a  sensor  of  20,000  rows  and  5  TDI  columns  in  each  of  4  color 
bands. The detector array moves horizontally by 1 column every 28 µs and is oversampled by a
factor of 4 in time, so that the entire set of detectors is read out every 7 µs. The dwell time used
in the TDI process is 28 µs.* A wafer scale (or any other) processor is unrealizable for this data
rate ($4 \times 10^{5}$ detectors $\times$ $1.4 \times 10^{5}$ Hz $= 5.6 \times 10^{10}$ samples/s) in current technology, although one will
eventually  be  feasible  using  one  micron  or  smaller  geometry  and  large  wafers. 
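
As a quick check of the arithmetic above, the strawman data rate can be recomputed from the sensor parameters. This is only a back-of-the-envelope sketch; the variable names are illustrative and do not come from the design.

```python
# Strawman sensor parameters quoted in the text.
rows = 20_000             # detector rows
tdi_columns = 5           # TDI columns per color band
color_bands = 4
readout_period_s = 7e-6   # every detector is read out once per 7 microseconds

detectors = rows * tdi_columns * color_bands    # 400,000 detectors
readout_rate_hz = 1.0 / readout_period_s        # roughly 1.4e5 Hz
samples_per_second = detectors * readout_rate_hz

print(f"{detectors:.1e} detectors x {readout_rate_hz:.2e} Hz "
      f"= {samples_per_second:.2e} samples/s")
```

The unrounded product is slightly above the quoted figure because the text rounds the readout rate to two significant digits before multiplying.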

Instead,  a  prototype  FPP  (PFPP)  was  designed  around  a  downsized  scanning  infrared  sensor, 
shown  in  Figure  2-1.  This  sensor  consists  of  a  monochrome  detector  array  with  only  64  rows. 
However, the number of TDI columns and the readout rate were retained from the strawman sensor,
so that the PFPP maintains the essential design parameters of the complete sensor, but with 1250
times fewer processing elements. These PEs could then be proliferated on 6-inch wafers with 1-µm
geometry,  but  need  not  be  redesigned  to  accommodate  the  full  strawman  sensor  point  design. 

The  following  list  is  a  summary  of  the  PFPP  design,  based  on  the  above  sensor  description 
and assuming 3-inch SOI wafers with 3-µm design rules.

(1) The processing of the 64 detector rows will be performed with a system using 2
wafers, each of which will contain 5 processor elements (PEs): 4 working and 1 spare.

(2)  Each  processor  element  processes  data  from  8  consecutive  rows  of  the  detector 
array. 

(3)  Input  data  are  assumed  to  be  12  bits  long.  This  wordlength  permits  a  mean 
background  that  is  two  orders  of  magnitude  greater  than  the  target  signal. 
Three-percent  precision  (5  bits)  is  then  possible  on  a  signal  that  is  one  percent 
(7 bits) of the mean [1].

(4) Eight detector rows are processed in the 7 µs sampling time, requiring 7 µs/8 =
875 ns per detector. In order to preserve full 12-bit accuracy throughout, the

*  Since  the  time  of  these  SSTS  studies,  the  space  surveillance  community  has  moved  toward  less 
aggressive  sensor  designs  emphasizing  near-to-intermediate  term  producibility.  Typical  integration 
times  have  become  an  order  of  magnitude  or  more  longer  and  the  number  of  detector  elements 
has decreased, although the number of TDI stages has gone up somewhat. The design for this
prototype  processor,  however,  was  frozen  before  these  changes  became  effective.  The  principal 
effect  of  implementing  the  changes  would  be  to  make  the  FPP  much  more  memory  intensive,  by 
increasing  the  size  of  calibration  memories  and  delay  buffers  while  reducing  the  number  of  PEs. 

Figure 2-1. Downsized scanning array.

12-bit  input  data  stream  will  be  padded  with  2  bits  of  leading  zeros,  providing 
for  word  growth  in  intermediate  stages  of  processing.  It  will  then  be  processed 
bit-serially. Thus, the processor clock will run at 875 ns/14 = 62.5 ns/bit (16
MHz). 

(5)  Fault-tolerance  is  obtained  by  connecting  the  processor  elements  to  the  input 
and  output  buses  through  multiplexors,  allowing  any  2  of  the  10  PEs  to  fail 
without  loss  of  functionality. 

(6)  Defect-tolerance  is  obtained  by  laying  down  a  large  number  of  PEs  and  piecing 
together good ones at restructuring time. Current area estimates indicate that
18 complete PEs could be laid down on a single wafer; however, fewer actually
will be (see Section 3.3).
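
The timing budget in items (2) and (4) of the list can be checked with a few lines of arithmetic. The sketch below simply restates the figures from the list; the names are illustrative.

```python
# Timing budget for one PE, using the figures from items (2) and (4).
sample_period_ns = 7_000      # 7 microsecond readout period
rows_per_pe = 8               # detector rows served by one PE
word_bits = 12 + 2            # 12 data bits plus 2 leading zeros

time_per_detector_ns = sample_period_ns / rows_per_pe   # 875 ns
bit_time_ns = time_per_detector_ns / word_bits          # 62.5 ns per bit
clock_mhz = 1_000 / bit_time_ns                         # 16 MHz

# Number of working PEs needed for the 64-row prototype array.
working_pes = 64 // rows_per_pe                         # 8, i.e. 4 per wafer
```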

Note  that  only  5  of  the  possible  18  PEs  per  wafer  are  required  to  restructure  the  proposed 
system.  Additional  bussing  and  pinouts  will  be  provided  so  that  if  yields  are  better  than  this  initial 
conservative  design  goal  requires,  the  wafer  can  be  configured  to  handle  a  larger  number  of  inputs. 




The  design  still  calls  for  a  2-wafer  set  in  order  to  exercise  the  multiple  wafer  design  concept  which 
will  eventually  be  required. 

Figure 2-2 represents a quasi-geographical schematic layout of one wafer from a two-wafer
processor set, with the lowest level of detail being four units: input mux/calibration/time alignment;
gamma circumvention/TDI summation; matched filter/detector; and output mux. The
fault-tolerant  data  busing  is  shown  in  detail  on  this  figure,  although  the  rest  of  the  busing  (e.g., 
off-wafer  calibration,  control  logic,  etc.)  is  not.  This  is  to  make  the  sparing  strategy  explicit,  as 
well  as  show  some  of  the  complexity  of  the  interconnect.  Note  that  since  the  architecture  is  serial, 
all  buses  are  only  one  bit  wide. 

2.2  FAULT-TOLERANCE  INPUT 

Figure  2-3  shows  the  input  and  calibration  cell.  Each  input  subunit  is  connected  by  a  4:1 
multiplexor  to  any  of  3  consecutive  input  signals  (except  for  PEs  on  the  ends  of  the  chain)  or  a 
test  pattern  input.  This  arrangement  permits  any  2  PEs  to  fail  at  runtime,  and  to  be  replaced  by 
their  neighbors.  Referring  back  to  Figure  2-2,  the  4  initially  active  PEs  on  the  wafer  are  shown 
labeled  A-D,  corresponding  to  the  array  segments  to  which  they  are  assigned.  The  second  wafer 
(not  shown),  will  have  an  identical  set  labeled  E-H.  The  input  subunits  are  also  subscripted  with 
the  TDI  stage  to  which  they  belong.  Two  spare  PEs  are  provided,  one  at  each  end  of  the  processor 
element  chain,  labeled  X  (shown  in  Figure  2-2)  and  Y  (on  the  other  wafer). 
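
The sparing scheme can be illustrated with a small model of the 10-PE chain (X, A-H, Y). This is a sketch under one stated assumption that is not spelled out in the text: the PE at chain position p can select input segment p-2, p-1, or p through its multiplexor. The function name and data layout are hypothetical.

```python
def assign_inputs(failed):
    """Map the 8 input segments to working PEs, preserving chain order.

    Chain positions 0..9 are X, A..H, Y; input segments are numbered 0..7.
    Assumption: the PE at position p can select input p-2, p-1 or p.
    Returns {input_index: pe_position}, or None if too many PEs have failed.
    """
    working = [p for p in range(10) if p not in failed]
    extra = len(working) - 8
    if extra < 0:
        return None
    # Leave unneeded end-of-chain spares idle, so that the fault-free
    # configuration uses PEs A..H for segments A..H.
    for spare in (9, 0):
        if extra and spare in working:
            working.remove(spare)
            extra -= 1
    return {i: p for i, p in enumerate(working)}
```

Because at most two positions are skipped in an order-preserving assignment, every PE ends up at most two places away from its input segment, which is why three consecutive inputs per multiplexor are enough to survive any two failures in this model.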

Figure  2-2  also  shows  output  muxes  for  the  PEs.  This  feature  would  make  the  sparing  strategy 
transparent  off-wafer;  each  output  pin  would  always  contain  signals  from  the  same  input  pixels.  In 
the  interest  of  simplicity,  however,  the  output  mux  will  not  be  implemented  in  the  PFPP.  Pins  for 
every  PE  are  present  and  the  ODP  will  have  to  keep  track  of  which  are  active. 

In this design, the fault-tolerant atom is the whole PE. This approach, which simplifies the
design concept, is made possible by the use of small low-capability serial processors. A parallel
processor  running  at  a  similar  clock  rate,  e.g.,  serving  96  rows  instead  of  8,  would  be  too  large  to 
discard  lightly. 

2.3  PIECEWISE  LINEAR  CALIBRATION 

Figure  2-3  shows  the  calibration  circuit.  The  input  data  are  processed  by  a  piecewise  linear 
approximation  to  a  function  which  corrects  for  nonlinearity  and  nonuniformity  in  the  detectors. 
There  is  a  separate  set  of  calibration  coefficients  for  each  of  the  8  detectors  assigned  to  a  single  PE. 
Each  of  these  calibration  functions  has  4  linear  segments.  The  appropriate  slope  and  offset  for  the 
piecewise  linear  function  are  selected  by  addressing  the  coefficient  memory  with  a  combination  of 
the  2  MSBs  of  the  input  data  to  indicate  which  of  the  4  linear  segments  to  use,  and  a  counter  to 
indicate  which  detector  is  being  corrected. 
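
The addressing scheme described above can be sketched as follows. The fixed-point format of the slope and the choice to multiply only the 10 LSBs (the position within a segment) are assumptions made for illustration; the text does not specify them. All names are hypothetical.

```python
SEGMENTS = 4                   # linear segments per detector
DETECTORS = 8                  # detectors served by one PE
SLOPE_BITS = 10
OFFSET_BITS = 12
FULL_SCALE = (1 << 12) - 1     # largest 12-bit value

def calibrate(sample, detector, coeff_mem, slope_frac_bits=9):
    """Apply the 4-segment correction to one 12-bit sample.

    coeff_mem[detector][segment] holds a (slope, offset) pair.
    Assumptions (not from the report): the slope multiplies the 10 LSBs
    of the sample and is an unsigned fixed-point number with
    `slope_frac_bits` fractional bits.
    """
    segment = sample >> 10             # 2 MSBs select the linear segment
    within = sample & 0x3FF            # remaining 10 bits
    slope, offset = coeff_mem[detector][segment]
    corrected = offset + ((slope * within) >> slope_frac_bits)
    return min(corrected, FULL_SCALE)  # overflow protection by saturation

# Identity table: unit slope, offsets at the segment boundaries.
identity = [[(1 << 9, seg << 10) for seg in range(SEGMENTS)]
            for _ in range(DETECTORS)]
memory_bits = DETECTORS * SEGMENTS * (SLOPE_BITS + OFFSET_BITS)
```

Loading different coefficients into the same table gives a plain gain and offset correction, for example by using one slope for all four segments of a detector.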

The  slope  coefficients  are  stored  with  10-bit  accuracy.  This  length  is  sufficient  to  maintain  the 
input accuracy, since each coefficient is applied over 1/4 full range. Since the offset is 12 bits, each

Figure 2-2. Schematic wafer layout.

Figure 2-3. Input and calibration cell.

calibration memory contains 8 × 4 × 22 = 704 bits, which is quite modest. Memories that small
tend  to  be  dominated  by  their  address  decoding  logic;  an  increase  in  integration  time  on  the  part 
of  the  focal  plane  would  permit  more  rows  to  be  handled  by  each  PE  and  a  concomitant  increase 
in  storage  efficiency. 

Although  the  circuit  is  designed  to  implement  a  4-segment  linear  correction,  several  other 
functions  are  possible  using  the  same  circuit,  but  with  different  data  in  the  memory,  notably  a 
simple  gain  and  offset  calibration.  At  present,  IR  systems  typically  use  either  single  point  (offset) 
or  2-point  (gain  and  offset)  calibration,  due  to  the  difficulty  of  finding  appropriate  calibration 
standards  in  the  infrared.  This  situation  is  unlikely  to  change  in  the  near  term;  the  requirement 
of  12-bit  accuracy  thus  translates  into  a  rather  daunting  requirement  on  the  photodiode  array  of 
linearity  better  than  1  part  in  4096. 

The SETUP logic on the right of Figure 2-3 controls the downloading of the calibration coefficients
from off-wafer. Note that aside from the overflow protection, no attempt is made in the
on-wafer  logic  to  impose  any  reasonableness  criteria  on  the  coefficients  (e.g.,  continuity  at  segment 
boundaries).  This  is  the  responsibility  of  the  off-wafer  calibration  algorithm. 

Data  representation  throughout  the  processor  is  positive  only.  This  convention  does  not  result 
in  any  loss  of  generality.  The  detector  element  with  the  highest  dark  current  will  have  an  offset  of 
zero  in  an  all  positive  scheme.  Other  elements  will  have  pedestals  added  to  match  it.  The  pedestal 
may  then  be  compensated  out  at  the  output  threshold.  Note,  however,  that  “hotter”  (higher  dark 
current)  pixels  still  effectively  compress  the  available  dynamic  range  of  the  processor.  Allocating  1 
of  the  12  bits  to  a  sign  cuts  the  range  by  a  factor  of  2,  but  with  a  detector  uniform  to  ±5  percent, 
the  largest  pedestal  is  410  out  of  4096,  giving  the  edge  to  the  all-positive  approach. 


2.4  TIME  ALIGNMENT 

The  time  delay  and  integration  process  requires  that  earlier  columns  be  delayed  so  that  they 
can  be  processed  along  with  later  ones.  In  the  strawman  design  under  consideration,  the  unit  TDI 
delay is 28 µs. Since the last stage need not be delayed, time alignment consists of delaying each
set of 32 detector inputs by 0, 28, 56, 84, or 112 µs, respectively. (Recall that there are 8 detector
rows  per  PE  and  each  dwell  is  oversampled  by  a  factor  of  4.)  Logically,  the  delay  stages  may  be 
thought  of  as  delay  lines.  However,  implementing  delay  lines  in  CMOS  is  undesirable  because  of 
the  large  switching  currents.  Instead,  the  delays  are  implemented  as  circular  buffers,  in  which  only 
the  address  pointers  are  incremented  while  the  data  remain  in  place.  A  single  delay  stage,  which 
is  a  restructurable  cell  in  the  design,  is  shown  in  Figure  2-4.  Incoming  data  arrive  in  bit-serial 
format,  are  converted  to  parallel  with  a  serial-in-parallel-out  (SIPO)  converter  and  are  stored  in  a 
32 × 12 static RAM. The read and write addresses for this memory are controlled by a counter.
Since  the  delay  is  32  words,  word  n  +  32  always  overwrites  the  location  that  word  n  was  just 
read  from.  Multiple  delays  are  implemented  by  daisy  chaining  this  32-word  delay  cell.  The  small 
capacity  memory  cell  is  not  area-efficient  by  itself,  but  the  efficiency  of  not  constructing  4  different 
size memories more than compensates.
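
The circular-buffer delay can be modeled in a few lines. The sketch below is a behavioral model only (it ignores the bit-serial conversion), and the class and function names are hypothetical.

```python
class DelayCell:
    """One 32-word delay stage modeled as a circular buffer."""

    def __init__(self, depth=32):
        self.mem = [0] * depth   # stands in for the 32 x 12 static RAM
        self.addr = 0            # one counter drives both read and write

    def step(self, word):
        out = self.mem[self.addr]          # read word n
        self.mem[self.addr] = word         # word n + 32 overwrites it
        self.addr = (self.addr + 1) % len(self.mem)
        return out


def delay_chain(stages):
    """Daisy-chain identical cells to obtain a delay of 32 * stages words."""
    cells = [DelayCell() for _ in range(stages)]

    def step(word):
        for cell in cells:
            word = cell.step(word)
        return word

    return step
```

Only the address counter changes on each step; the stored words stay in place, which is the property that avoids the switching currents of a true shift register.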

Figure 2-4. Time delay cell.

2.5  GAMMA  CIRCUMVENTION  AND  TDI  SUMMATION 

The purpose of this cell is twofold: reject detector element signals which have been contaminated
by γ events and then average the remaining TDI elements together. Before turning to the
implementation  on  the  PFPP,  we  will  give  a  brief  introduction  to  gamma  circumvention  (in  order 
to  motivate  it)  and  an  alternative  approach. 

What is being circumvented in gamma circumvention is noise produced not directly by γ rays, but
by electrons produced by the interaction of γ radiation with matter in the vicinity of the detector
array. The interaction of γ radiation with matter takes place through three main mechanisms:

(1)  Photoelectric  effect 

(2)  Scattering  on  free  electrons 

(3)  Pair  production 

At  the  energies  associated  with  nuclear-produced  radiation,  items  (1)  and  (2)  are  the  dominant 
mechanisms.  (See,  for  example,  [2]  section  2-9  for  a  discussion  of  the  physics.)  The  resultant 
electrons  are  charge  carriers  which  produce  effects  in  the  detector  similar  to  those  produced  by 
IR-photon-induced carriers. They produce an energy spectrum with a long exponential falloff
(“Landau  tail”)  characteristic  of  the  passage  of  ionizing  radiation  through  matter. 

2.5.1  Algorithm 

The algorithm chosen is a variant of the Spike Adaptive (SATDI) type [15,16]. Many variants
of  SATDI  exist,  but  all  rely  on  the  basic  idea  that  detector  response  within  a  TDI  set  should  be  the 
same  within  some  noise  variation.  Any  sample  outside  some  statistically  determined  limit  is  then 
assumed to be contaminated with a “γ” pulse, and is eliminated. A common approach (assuming
unipolar  spikes)  is  to  use  a  lowest-of-N  algorithm,  in  which  the  lowest  TDI  sample  is  considered  to 
be  the  one  most  likely  free  of  contamination.  This  algorithm  is  easy  to  implement  in  digital  logic. 
Because  of  its  theoretical  attractiveness,  however,  the  approach  taken  here  is  to  model  the  data  as 
a Poisson random variable with mean $\lambda$ and standard deviation $\sqrt{\lambda}$. The estimated parameter $\lambda$
is formed by summing the 5 TDI samples and scaling by 1/5. The threshold is then formed as

$$\lambda + k\sqrt{\lambda}$$

where  k  is  the  number  of  standard  deviations  used.*  A  TDI  sample  which  exceeds  the  threshold  is 
considered  contaminated  and  excluded. 
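
The thresholding rule just described can be written out as a short floating-point sketch. It is meant only to illustrate the arithmetic, not the fixed-point hardware; the function name is hypothetical.

```python
import math

def satdi(samples, k):
    """Reject spike-contaminated TDI samples and average the rest."""
    lam = sum(samples) / len(samples)        # estimate of the Poisson mean
    threshold = lam + k * math.sqrt(lam)     # lambda + k * sqrt(lambda)
    kept = [s for s in samples if s <= threshold]
    return sum(kept) / len(kept), threshold

# One TDI set with a single large spike in the last sample.
average, threshold = satdi([100, 102, 98, 101, 400], k=3)
```

Note that the mean estimate includes the contaminated sample, so a spike raises its own threshold somewhat. For k >= 0 at least one sample always passes, because the smallest sample cannot exceed the mean.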

2.5.2  Alternate  Approach 

The thrust of all SATDI approaches is to consider the γ-contaminated samples to be bad data,
eliminate  them,  and  proceed  with  processing  on  the  remaining  data.  The  SATDI  approach  has  two 
disadvantages: 

(1) The signal-to-noise ratio is degraded, for the discarded samples no longer contribute
to the $\sqrt{N}$ SNR gain.

(2) The output becomes biased, as the γ threshold eliminates samples with large
positive  random  variation. 

An  alternate  approach  is  to  perform  a  maximum  likelihood  detection  algorithm  on  the  signal  in 
the presence of γ noise. This approach is feasible if a parametric form of the γ noise is assumed,
and  is  explored  in  more  detail  in  Appendix  A. 

2.5.3  Threshold  Generation 

Figure  2-5  shows  the  schematic  for  the  gamma  circumvention  and  TDI  summation  cell.  It  is 

* Note that Poisson statistics apply in this form only to the raw photodetection process. If
the detector output has been scaled down by some factor s at the input stage, then the standard
deviation becomes $\sqrt{s\lambda} = \sqrt{s}\sqrt{\lambda}$. Thus, the threshold factor k must effectively be rescaled by $\sqrt{s}$.
broken into two logical sections: the upper part generates the SATDI threshold, while the lower
part  compares  each  TDI  sample  with  the  threshold  and  averages  the  accepted  samples. 

Threshold generation in SATDI is a reasonably heuristic affair; consequently there is no requirement
of extreme precision in the threshold generation circuit. The 1/5 circuit is approximated
by


(1)  Summing  the  5  inputs 

(2)  Rounding  and  shifting  right  2  bits,  leaving  a  13  significant  digit  sum 

(3) Multiplying the sum by 4/5, approximated to 6 bits as $0.110011_2 = 0.796875_{10}$,
for a 0.4 percent error.

See  Appendix  B  for  a  further  discussion  of  this  approach. 
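
The three steps above amount to replacing a division by 5 with a shift and a small constant multiply. A minimal integer sketch follows; the truncation in the final step is an assumption, since the text does not say how the product is rounded, and the function name is hypothetical.

```python
def approx_mean_of_5(samples):
    """Approximate sum/5 using a 2-bit shift and a 6-bit constant."""
    total = sum(samples)                # step (1): sum the 5 inputs
    quarter = (total + 2) >> 2          # step (2): round, shift right 2 bits
    return (quarter * 0b110011) >> 6    # step (3): multiply by 51/64

constant = 0b110011 / 64                # 0.796875, standing in for 4/5
relative_error = abs(constant / 0.8 - 1)
```

The constant is slightly below 4/5, so the estimate is biased low by about 0.4 percent, which is harmless for a heuristic threshold.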

To save space, the square root is calculated using a 256 × 6 ROM. As shown in Figure 2-5,
the  method  is  a  2-range  lookup  table.  The  12-bit  input  data  are  shifted  left  4  bits  if  the  data  item 
is  less  than  256,  and  the  resulting  8  MSBs  are  then  used  to  address  the  table.  This  shift  maps 
the  ranges  0  to  255  and  256  to  4095  into  a  single  256  element  table.  The  output  of  the  table  is 
compensated  by  shifting  right  2  places  if  the  input  is  shifted  left.  The  dual-range  lookup  yields  a 
maximum  difference  of  1  from  a  true  integerized  square  root  over  the  range  0  to  4095. 
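
The two-range lookup can be modeled as below. The actual ROM contents are not given in the text; the sketch assumes each entry holds the integer square root of the smallest 12-bit value that maps to it, which is one plausible fill rule. Names are hypothetical.

```python
import math

# 256 x 6 table addressed by the 8 MSBs of a 12-bit value.
# Assumption: entry a holds floor(sqrt(16 * a)); the largest entry is 63,
# so 6 bits per word are sufficient.
SQRT_ROM = [math.isqrt(16 * a) for a in range(256)]

def sqrt_lookup(x):
    """Two-range integer square root of a 12-bit value."""
    if x < 256:
        shifted = x << 4                     # small inputs: shift left 4 bits
        return SQRT_ROM[shifted >> 4] >> 2   # compensate: shift right 2 places
    return SQRT_ROM[x >> 4]                  # large inputs: use the 8 MSBs

worst = max(abs(sqrt_lookup(x) - math.isqrt(x)) for x in range(4096))
```

The compensation works because a left shift of 4 bits multiplies the input by 16, and the square root of 16 is 4, i.e. a right shift of 2 bits.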

The  output  of  the  square  root  table,  which  represents  an  estimate  of  the  standard  error,  is 
then multiplied by a 5-bit γ constant and the product is added back into the delayed average to
form the SATDI threshold. The multiplier is arranged so that the output is scaled by 1/8. Thus, the
γ constant is effectively in the form xx.xxx, allowing a range of 0 to 3.875 in steps of 0.125.


2.5.4  Comparator  and  TDI  Summation 

The  output  of  the  SATDI  generation  circuit  is  fanned  out  and  compared  with  the  delayed 
TDI  set  in  parallel.  Those  elements  which  are  under  threshold  are  passed  through  to  a  summer. 
The  output  of  the  comparator  is  also  passed  to  a  circuit  which  generates  a  multiplier  for  scaling 
the  summer  output.  The  multiplier  is  4/N  rather  than  1/N  because  the  summer  has  prerounded 
and  right-shifted  the  sum  bits  by  2,  in  order  to  guarantee  that  the  maximum  number  of  nonzero 
bits  is  13.  Proceeding  in  this  manner,  which  is  advantageous  from  a  hardware  point  of  view,  can 
cause  an  error  in  the  least  significant  bit.  The  effect  is  not  significant  except  when  the  signal  and 
background  are  both  small.  Appendix  B  contains  a  more  detailed  discussion. 
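
The least-significant-bit error mentioned above can be seen in a small integer model of the summation path. How the hardware realizes the 4/N multiplier is not described here, so the sketch uses an exact integer division as a stand-in; the function name and the zero output for an empty set are assumptions.

```python
def tdi_average(samples, threshold):
    """Average the samples at or below threshold using a 4/N scale factor."""
    kept = [s for s in samples if s <= threshold]
    n = len(kept)
    if n == 0:
        return 0                        # assumption: no accepted sample
    summed = (sum(kept) + 2) >> 2       # preround, then shift right 2 bits
    return (4 * summed) // n            # scale by 4/N

large = tdi_average([1000, 1001, 1002, 4000, 999], threshold=1500)
small = tdi_average([1, 1, 1, 1, 1], threshold=10)
```

With large values the result matches the true average to within rounding, while for very small values the 2 bits dropped before scaling can change the output by one count.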

2.6  MATCHED  FILTER  AND  DETECTOR 

The matched filter is a separable 4 × 4 digital filter. Being separable means that the filter
may  be  constructed  as  the  convolution  of  a  4-tap  filter  oriented  vertically  with  another  4-tap  filter 
oriented  horizontally,  resulting  in  a  savings  in  the  amount  of  computation  required.  The  4-tap 
horizontal filter is matched to the 4× in-scan oversampling. The 4-tap vertical filter assumes that
the  cross-scan  resolution  is  made  comparable  to  the  in-scan  resolution  by  using  rectangular  pixels. 

The  matched  filter  and  detector  cell  is  shown  in  Figure  2-6.  A  far  more  detailed  hardware 
description  is  given  in  [3].  Much  of  the  complication  of  the  interconnect  stems  from  the  fact  that 
the  cross-scan  filter  requires  data  from  adjacent  detector  rows  and  hence  adjacent  PEs.  The  most 
naive  design  would  require  data  from  both  nearest  neighbors;  the  current  implementation  offsets 
the  cross-scan  filter  so  that  a  PE  only  requires  data  from  the  previous  PE  (Figure  2-7).  The  output 
of  the  PE  can  then  be  shifted  up  to  compensate  for  this  offset  (ignoring  edge  effects). 

The  8x12  delays  are  implemented  as  true  shift  registers  in  this  design,  resulting  in  a  significant 
current  draw.  A  more  capable  PE  would  require  longer  delays  and  these  could  also  be  implemented 
as  circular  buffers  like  the  time  alignment  memories. 

A  simple  threshold  detector  is  attached  to  the  output  of  the  horizontal  convolution.  The 
threshold  is  loaded  from  off-wafer,  so  that  it  can  be  adjusted  during  operation  of  the  PFPP  as  a 
means  of  controlling  the  overall  false  alarm  rate.  The  comparator  is  implemented  as  a  combinatorial 
full  adder,  and  the  full  signed  difference  is  sent  off-wafer  to  the  ODP. 


Figure 2-6. Matched filter and detection cell.

Figure 2-7. FIR filters.

3. IMPLEMENTATION ISSUES


3.1  AREA  ESTIMATES 

In the course of the conceptual design of the wafer, area estimates were made using relatively
crude estimates of the circuit elements needed in the design. The area required for certain circuit
components was estimated as shown in Table 3-1. The estimates for static RAM and shift registers
were from designs being developed by Group 23 at Lincoln Laboratory. The memory figure assumed
a cell size of $40\lambda \times 50\lambda$, and amortized the area required for read/write and address select over
the per-bit figure. This area is nonnegligible for small memories such as those used in this design. The
remaining estimates were from MOSIS scalable designs with $\lambda$ = 1.5 µm (for 3-µm technology).

TABLE 3-1

Component Area Estimates

Component                          Area (mm²)
Static RAM, per bit                0.0063
Shift register (static), per bit   0.0081
Serial multiplier, per bit         0.1200
Serial adder, per bit              0.0580
Tristate register, per bit         0.0225
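
As a rough check on the static RAM figure, the bare cell area can be compared with the per-bit value in Table 3-1. The sketch below reads the cell size as 40 x 50 in units of the scalable design rule parameter lambda, with lambda = 1.5 µm for the 3-µm technology; that reading, and the attribution of the whole difference to amortized read/write and address-select circuitry, are assumptions made for illustration.

```python
LAMBDA_UM = 1.5             # assumed design-rule unit for 3-um technology, in um
TABLE_RAM_PER_BIT = 0.0063  # static RAM area per bit from Table 3-1, in mm^2


def cell_area_mm2(width_lambda, height_lambda, lambda_um=LAMBDA_UM):
    """Area of a cell whose sides are given in units of lambda."""
    width_um = width_lambda * lambda_um
    height_um = height_lambda * lambda_um
    return width_um * height_um * 1.0e-6  # 1 mm^2 = 1e6 um^2


if __name__ == "__main__":
    bare_cell = cell_area_mm2(40, 50)
    overhead = TABLE_RAM_PER_BIT - bare_cell
    print(f"bare cell area : {bare_cell:.4f} mm^2")
    print(f"amortized extra: {overhead:.4f} mm^2 "
          f"({100 * overhead / TABLE_RAM_PER_BIT:.0f}% of the per-bit figure)")
```

Under these assumptions the bare cell accounts for 0.0045 mm² of the 0.0063 mm² per-bit figure, so a little under a third of the tabulated area would be overhead.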


3.2  SERIAL  VERSUS  PARALLEL  ARITHMETIC 

Using these figures, area estimates for both a serial and a parallel arithmetic processor were
developed. For equal clock rates, the parallel processor will have 12 times the capability of the serial
one.* One might naively expect each serial processor to occupy 1/12 of the area of an equivalent
parallel processor (see [12], p.20). This is not the case for a number of reasons:

(1) Extra accumulator registers have to be provided for the multipliers.

(2) Extra shift registers have to be provided for increased latency at choke points
where all bits are required (e.g., calibration, gamma circumvention).

(3) Calibration and TDI delay memories become less dense as their size is reduced
to serve fewer detectors.

(4) Since serial adders are pipelined, intermediate word growth that appears when
summation is followed by division, e.g., TDI summation, has to be accommodated
by padding out the bit stream with extra zeros.

* For ease in supplying input data, the prototype parallel processor was sized for only 16 detector
rows rather than 96; it was designed to run in burst mode, with a low duty cycle.

Figures  3-1  and  3-2  are  graphical  representations  of  the  area  estimates  for  the  PFPP  using 
parallel  and  serial  arithmetic.  Inset  into  them,  in  turn,  are  Tables  3-2  and  3-3  which  present  the 
data  numerically  and  serve  as  the  figure  keys.  In  the  figures,  space  allocated  to  a  PE  is  represented 
by  the  horizontal  chaindash  bars.  Within  the  bars,  shaded  boxes  represent  area  allocated  to  circuit 
elements;  white  space  around  the  boxes  is  reserved  for  interconnect.  The  crosshatched  areas  at  the 
left  and  right  are  input  and  output  buses. 

The  figures  graphically  illustrate  the  smaller  granularity  of  the  bit-serial  architecture.  Due 
to  the  significantly  larger  size  of  a  parallel-arithmetic  PE,  and  the  width  of  the  buses,  only  6 
(optimistically)  would  fit  on  a  3-inch  wafer.  The  sparing  strategy  was  to  have  3  working  PEs  (2 
active  and  1  spare)  per  wafer.  Eighteen  serial  processors  were  calculated  to  fit  in  the  50-mm  square 
contained  within  a  3-inch  wafer.  Since  only  5  are  needed  in  the  prototype  system,  this  approach 
would permit a working processor even if early yields in the SOI process were relatively low.

TABLE 3-3

Area Allocation for a PE - Bit Serial

Cell                      Box  Contents                     Area         Horizontal  Vertical
                                                            (mm²)        belt (bus)  belt (bus)
Input/calibration         A    5 IMUX                       [illegible]  5           5
                          B    5 CRAM                                    5           5
                          C    5 MULADD                                  5           5
Delay                     D    10 RAM32                     35.5         5           5
Gamma/TDI summation       E    2 SUM5 ; 1 TDI5              1.0          5           5
                          F    2 1/N ; 1 MULADD ; 1 RADIC   14.4         2           2
Matched filter/detector   G    1 REG-SW                     2.6          2           2
                          H    4 TWOBYTWL ; 1 SUM5          3.5          2           2
                          I    3 SHIFTR                     1.6          2           0
                          J    4 TWOBYTWL ; 1 SUM5          3.5          2           2
                          K    1 THRESH                     2.2          2           2
TOTAL                                                       109.1

Pre-TDI 5-wide bus (5 wires) = 0.18 mm at 35-µm pitch
Post-TDI 2-wide bus (2 wires) = 0.07 mm at 35-µm pitch

Figure 3-2. Schematic area allocation on PFPP wafer - bit serial.

3.3 PREDICTED VERSUS ACTUAL AREA


Another  interesting  comparison  can  be  made  between  the  estimates  given  in  Table  3-3,  which 
were  generated  roughly  a  year  before  the  present  report  was  written,  and  the  actual  area  taken  by 
the  cells.  As  of  this  writing,  two  of  the  four  cells  (TDI  delay  and  matched  filter/detector)  have 
been  designed  by  Group  23  and  received  back  from  MOSIS.  A  third  (input  and  calibration)  is  in 
final  layout  and  its  size  can  be  estimated  with  confidence.  The  fourth  (gamma  circumvention/TDI 
summation)  is  well  along,  and  its  area  can  be  estimated  with  reasonable  accuracy.  This  comparison 
is  made  in  Table  3-4. 


TABLE 3-4

Estimated versus Actual Cell Area

Cell                      Estimated (mm²)  Actual (mm²)  Difference (%)
Input/calibration         9.0              ~10           ~11
TDI delay                 3.5              3.4           -3
Gamma/TDI summation       15.4             ~17.5         ~14
Matched filter/detector   13.4             10.0          -25

Agreement between the rough calculations and the as-laid-out areas is remarkably good. At the
time of the initial calculation (16 June 1987), Group 23 had a reasonable idea of what its small
static RAM would look like; hence the input/calibration and delay estimates are much closer than
those for the other two cells. Early memory estimates, based on large commercial RAMs, had tended to be
much more optimistic. Since the errors on gamma/TDI summation and matched filter/detector
roughly cancel, it is reasonably certain that the goal of laying down 18 PEs on a 50-mm square
could be met.
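
The percentage differences in Table 3-4, and the extent to which the errors cancel, can be rechecked with a few lines of Python. The dictionary below simply transcribes the table, treating the approximate entries (~10 and ~17.5) as exact for the purpose of the arithmetic; the variable and function names are illustrative.

```python
# (estimated, actual) cell areas in mm^2, transcribed from Table 3-4
CELLS = {
    "Input/calibration": (9.0, 10.0),
    "TDI delay": (3.5, 3.4),
    "Gamma/TDI summation": (15.4, 17.5),
    "Matched filter/detector": (13.4, 10.0),
}


def percent_difference(estimated, actual):
    """Signed difference of actual from estimated, as a percentage."""
    return 100.0 * (actual - estimated) / estimated


if __name__ == "__main__":
    for name, (est, act) in CELLS.items():
        print(f"{name:25s} {percent_difference(est, act):+6.1f} %")
    total_est = sum(est for est, _ in CELLS.values())
    total_act = sum(act for _, act in CELLS.values())
    print(f"total estimated {total_est:.1f} mm^2, total actual {total_act:.1f} mm^2")
```

Summed over the four cells, the estimated and actual areas differ by well under 1 mm², which is the cancellation referred to above.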

Early results with wafer-scale circuits implemented in the Lincoln Laboratory zone-melt-refined
(ZMR) SOI technology (see [10] for a review), however, indicate that defects in ZMR wafers tend
to occur preferentially at the edge of the wafer. Therefore, the preliminary FPP is being designed
to fit into a 40-mm square, allowing a 5-mm buffer on all edges: $(40/50)^2 \times 18 \approx 11.5$; because of
inefficiencies in packing, probably about 10 PEs will fit in this smaller area.

4. RELIABILITY ESTIMATES


One of the most stressing demands on a focal plane processor is the requirement of reliability
in a space environment. The FPP is designed for a nominal five-year lifetime. Modern integrated
circuit design results in highly reliable circuits. The extremely large number of circuit elements,
however (approximately 12,000 transistors in the matched filter/detector cell alone), results in a rather small
total probability for a system working perfectly for five years.

It  is  vital  to  design  reliability  in  from  the  beginning  in  order  to  have  any  realistic  hope  of 
achieving  mission  requirements.  On  the  other  hand,  precise  reliability  measurements  are  obviously 
lacking  in  any  new  design,  and  more  so  than  usual  in  rad-hard  wafer-scale  technology.  As  discussed 
previously,  the  approach  taken  for  the  PFPP  is  to  utilize  a  redundant  network  of  processor  elements. 
In  order  to  evaluate  this  approach  quantitatively,  the  reliability  of  an  individual  PE  must  be 
estimated;  if  the  PE  is  too  complex,  the  survival  rate  will  be  too  small.  In  this  case,  sparing 
must  be  provided  at  a  lower  level,  or  a  large  number  of  spare  PEs  must  be  allocated.  Thus,  some 
sort  of  estimate  of  PE  reliability  must  be  found  in  spite  of  the  novelty  of  the  technology.  A  certain 
amount  of  sloppiness  in  the  estimates  must  be  tolerated,  and  the  sensitivity  of  the  overall  PFPP 
reliability  to  this  uncertainty  must  be  at  least  estimated. 

4.1  MIL-HDBK-217E 

Recognizing the problem, DoD has issued MIL-HDBK-217E, Reliability Prediction of Electronic
Equipment [4]. This handbook presents failure models of electronic components and systems,
as well as constants for evaluating the models based on experience to date.

Section 5.1.2 of [4] presents a failure rate prediction model for monolithic microelectronic
devices. This model is:

\[ \lambda_p = \pi_Q \cdot (C_1 \pi_T \pi_V + C_2 \pi_E) \cdot \pi_L \]

where

$\lambda_p$  is the predicted device failure rate in failures/$10^6$ hours,
$\pi_Q$  is the quality factor,
$C_1$  is a circuit complexity factor, depending on transistor count and technology,
$\pi_T$  is the temperature acceleration factor,
$\pi_V$  is the voltage stress derating factor,
$C_2$  is a package complexity factor,
$\pi_E$  is the application environment factor, and
$\pi_L$  is the device learning factor.

For the purposes of our discussion, let us consider this as

\[ \lambda_p = \pi_Q \pi_L \cdot ( \underbrace{C_1 \pi_T \pi_V}_{\text{Term 1}} + \underbrace{C_2 \pi_E}_{\text{Term 2}} ) . \tag{4.1} \]

4.2 PROCESSOR ELEMENT RELIABILITY ESTIMATE


Term 1 applies to failures of individual PEs, and hence will be reduced by PE sparing; Term 2
applies to packaging failures. Since the pinouts are not redundant in the current design, Term 2
will not be affected by our sparing strategy: failure of a pin will reduce the capability of the wafer
to perform its mission. Clearly there is not a lot of field experience with wafer-scale SOI. However,
in order to proceed with quantitative analysis of a PE, we must evaluate the various factors of
Term 1 as best we can. (Note that references to Tables 5.1.2.7-x in this section and Section 4.3 are to tables
in [4], not this report.)

$C_1$: As might be expected, MIL-HDBK-217 has no data directly applicable to wafer-scale
devices. Hence, values for $C_1$ must be extrapolated from data it presents. Of the devices that
might be applicable to the FPP case, Section 5.2.1 presents data for shift registers, static RAMs,
and microprocessors in CMOS. The approach taken here is to simply calculate the $C_1$ for each, and
then sum them. (Adding probabilities of failure is equivalent to multiplying probabilities of success
as long as $P_{\mathrm{fail}} \ll 1$.) We have

Device                         $C_1$
Shift register (<1000 gates)   0.02
Static RAM (<16 K)             0.10
Microprocessor (16 bit)        0.06
Total                          0.18

$\pi_T$: This factor depends on the technology and the worst-case junction temperature. For
the space flight environment, the worst-case case temperature is specified as 45 °C. The rise over case
temperature is difficult to estimate with any precision; the PFPP wafer may dissipate ≈ 5 W (the
Lincoln Laboratory fast Fourier transform wafer dissipates ≈ 3 W at 16 MHz) [6]. This heat will
not be produced uniformly over the surface of the wafer, however. A wild guess for the worst-case
junction temperature rise is 15 °C over the case temperature, giving $T_J$ = 60 °C. This choice yields
$\pi_T = 0.95$ from Table 5.1.2.7-8.

$\pi_V$: This is 1.0 from Table 5.1.2.7-14.


4.3  FPP  RELIABILITY  ESTIMATE 

To  complete  the  analysis  of  the  wafer,  we  will  first  evaluate  Term  2  of  equation  4.1  to  gain  an 
estimate  of  wafer  reliability  without  sparing;  then  we  will  add  in  the  sparing  combinatorics. 

$C_2$: The package complexity factor is given in Table 5.1.2.7-16 as

\[ C_2 = 3.0 \times 10^{-5} \, N_P^{1.82} \]

for hermetic flatpacks. This equation is only valid up to $N_P = 24$ pins. However, blithely extrapolating
to 40 pins for the prototype FPP, we obtain $C_2 = 0.188$. Note that for a 6-inch wafer of
500 pins, this equation yields $C_2 = 2.5$, suggesting that redundant pinouts (or much improved
packaging) will be an important part of the design strategy for the full-up FPP.

$\pi_E$: The space flight environment $S_F$ is relatively benign, and from Table 5.1.2.7-3 $\pi_E = 0.9$.

$\pi_L$: The learning factor $\pi_L$ is taken to be 10 in the case of a new device in initial production
and/or a new and unproven technology.

$\pi_Q$: The quality factor $\pi_Q$ is keyed to the military classification system established in
MIL-STD-883. It is given as

Class   $\pi_Q$
S       0.25
S-1     0.75
B       1.0

Since the FPP would clearly not be listed on QPL-38510, I have considered it to be Class S-1, and
assigned $\pi_Q = 0.75$.

For small numbers of PEs, there are roughly 8 pins per PE (5 TDI inputs, 1 output, 1 prior
PE reference, and 1 control). Rewriting equation 4.1 in terms of $N_{PE}$, the number of PEs, and
evaluating constants, we obtain for a wafer without sparing

\[ \lambda_p = 7.5 \cdot [ \underbrace{0.17\,N_{PE}}_{\text{Term 1}} + \underbrace{2.7 \times 10^{-5}\,(8 N_{PE})^{1.82}}_{\text{Term 2}} ] . \tag{4.2} \]

For $N_{PE} = 8$, Equation 4.2 evaluates to Term 1 = 1.36 and Term 2 = 0.05, confirming our intuition
that (for relatively simple packaging) PE sparing is most important. Wafer reliability over 5 years is
evaluated using Equation 4.2 in Figure 4-1(a) for various numbers of PEs on a single wafer without
sparing. As can be seen, although the probability of survival for a single PE is $P_0 = 0.946$, for the 8
PEs required to do the job, predicted reliability is $P_8 = 0.64$, which is rather poor.
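
The numbers quoted above can be reproduced approximately with the following Python sketch of Equation 4.2. The conversion from failure rate to survival probability assumes a constant failure rate (exponential survival) over 5 x 8760 hours; that model and all identifier names are assumptions for illustration, chosen because they recover the quoted values of 0.946 and 0.64 when only Term 1 is used.

```python
import math

PI_Q_PI_L = 7.5              # pi_Q * pi_L = 0.75 * 10
TERM1_PER_PE = 0.17          # C1 * pi_T * pi_V, rounded as in Equation 4.2
TERM2_COEFF = 2.7e-5         # 3.0e-5 * pi_E with pi_E = 0.9
TERM2_EXPONENT = 1.82
PINS_PER_PE = 8
HOURS_IN_5_YEARS = 5 * 8760  # assumption: 8760 hours per year


def failure_terms(n_pe):
    """Return (Term 1, Term 2) of Equation 4.2 for n_pe processor elements."""
    term1 = TERM1_PER_PE * n_pe
    term2 = TERM2_COEFF * (PINS_PER_PE * n_pe) ** TERM2_EXPONENT
    return term1, term2


def survival_probability(rate_per_1e6_hours, hours=HOURS_IN_5_YEARS):
    """Survival probability for a constant failure rate (assumed model)."""
    return math.exp(-rate_per_1e6_hours * hours / 1.0e6)


if __name__ == "__main__":
    term1, term2 = failure_terms(8)
    print(f"Term 1 = {term1:.2f}, Term 2 = {term2:.3f}")
    p_single = survival_probability(PI_Q_PI_L * failure_terms(1)[0])
    p_eight = survival_probability(PI_Q_PI_L * term1)
    print(f"single PE: {p_single:.3f}, eight PEs: {p_eight:.2f}")
```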

Figure  4-l(b)  shows  the  effect  of  sparing  for  a  2-wafer  set.  System  reliability  rises  dramatically 
as  the  first  2  spares  are  introduced,  then  levels  off  and  begins  to  fall  slightly,  so  that  additional 
spares  are  not  helpful.  This  effect  is  due  to  Term  2,  the  projected  packaging  failure  rate,  since  the 
PE  sparing  term  alone  continues  to  rise  to  1.  Hence,  redundant  pinouts  will  have  to  be  introduced 
to  raise  system  reliability  above  the  ~  97  percent  level  in  this  failure  model. 
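
The PE sparing term on its own is an M-of-N calculation: the system works if at least 8 of the 8 + s PEs survive. A minimal Python sketch follows, assuming independent PE failures and a single-PE survival probability of 0.946; it deliberately omits the packaging term and the details of the 2-wafer arrangement, so it is not expected to reproduce Figure 4-1(b) exactly.

```python
import math


def m_of_n_survival(p_survive, needed, total):
    """Probability that at least `needed` of `total` independent units survive."""
    return sum(
        math.comb(total, k) * p_survive ** k * (1.0 - p_survive) ** (total - k)
        for k in range(needed, total + 1)
    )


if __name__ == "__main__":
    p_single = 0.946  # single-PE 5-year survival probability quoted in the text
    for spares in range(5):
        p_system = m_of_n_survival(p_single, 8, 8 + spares)
        print(f"{spares} spares: PE-only survival = {p_system:.4f}")
```

With these assumptions the PE-only survival probability climbs from about 0.64 with no spares to above 0.98 with 2 spares and keeps approaching 1, so any eventual decline in system reliability has to come from another term.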


4.4  INFLUENCE  OF  PAIRWISE  INTERACTIONS 

A  further  question  which  arises  in  the  context  of  FPP  reliability  calculations  is  the  effect  of 
correlated  failures.  Section  4.3  assumes  that  the  individual  PE  failure  rates  computed  according  to 
MIL-HDBK-217E are statistically independent, and combines them accordingly.

Figure 4-1. Processor reliability: (a) reliability without sparing; (b) M-of-N PE sparing.
(Legend: $P_0 = 0.946$; squares = 8 active PEs; circles = packaging; triangles = system.
Horizontal axis of (b): number of spares.)


However, it is possible that the failure of one PE might somehow stress its neighbors (e.g., by
dragging down bus voltage). In this case, failure of one PE would increase the probability that its
neighbors would fail, so that $P_{\mathrm{fail}}[PE_i \mid PE_{i\pm1}] > P_{\mathrm{fail}}[PE_i]$.

The approach taken here is to assume that the failure probability calculated from MIL-HDBK-217E
is that of an isolated processor; nearest-neighbor interaction is then added by multiplying by
a tri-diagonal interaction matrix containing the nearest-neighbor interaction $C_{NN}$ (this approach
assumes $P_{\mathrm{fail}} \ll 1$):

\[
\begin{bmatrix} P'_{\mathrm{fail}_1} & \cdots & P'_{\mathrm{fail}_8} \end{bmatrix}
=
\begin{bmatrix} P_{\mathrm{fail}_1} & \cdots & P_{\mathrm{fail}_8} \end{bmatrix}
\begin{bmatrix}
1      & C_{NN} & 0      & \cdots & 0      \\
C_{NN} & 1      & C_{NN} & \cdots & 0      \\
0      & C_{NN} & 1      & \ddots & \vdots \\
\vdots &        & \ddots & \ddots & C_{NN} \\
0      & \cdots & 0      & C_{NN} & 1
\end{bmatrix}
\]

Not all $P'_{\mathrm{fail}_i}$ are the same using this approach, and the sparing calculation is modified to
incorporate this fact.

Figure 4-2 presents the results of a calculation of the 5-year probability of success for
various $C_{NN}$ as a function of the number of spares, using the PE failure rate calculated from
MIL-HDBK-217E. For $C_{NN} = 0.2$, 2 spares are adequate to keep the overall probability of success above
0.95.
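
One way to sketch this calculation in Python is shown below. The first function applies a tri-diagonal interaction with unit diagonal and off-diagonal coupling to a row of isolated failure probabilities; the second evaluates an M-of-N survival probability when the units have unequal survival probabilities. Treating the adjusted failures as independent, the single-PE value of 0.946, and all names are assumptions for illustration; how the original sparing calculation was modified is not specified in the text.

```python
def interacting_failure_probs(p_fail, c_nn):
    """Multiply a row of failure probabilities by a tri-diagonal matrix
    with 1 on the diagonal and c_nn on the two adjacent diagonals."""
    n = len(p_fail)
    adjusted = []
    for i in range(n):
        total = p_fail[i]
        if i > 0:
            total += c_nn * p_fail[i - 1]
        if i < n - 1:
            total += c_nn * p_fail[i + 1]
        adjusted.append(total)
    return adjusted


def at_least_m_survive(p_survive, needed):
    """Probability that at least `needed` units survive, for independent
    units with individual survival probabilities `p_survive`."""
    dist = [1.0]  # dist[k] = probability that exactly k units survive so far
    for p in p_survive:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)
            new[k + 1] += prob * p
        dist = new
    return sum(dist[needed:])


if __name__ == "__main__":
    isolated = [1.0 - 0.946] * 10          # 8 active PEs plus 2 spares
    coupled = interacting_failure_probs(isolated, 0.2)
    p_system = at_least_m_survive([1.0 - p for p in coupled], 8)
    print(f"end PE: {coupled[0]:.4f}, interior PE: {coupled[5]:.4f}")
    print(f"system survival with 2 spares: {p_system:.3f}")
```

End PEs have only one neighbor, so their adjusted failure probability is lower than that of interior PEs, which is why the adjusted probabilities are not all equal.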


Figure  4-2.  Processor  reliability  with  nearest-neighbor  interaction. 

Since the rate calculations from [4] are far from exact, it is worthwhile looking at what sort of
margins the sparing strategy provides. Thus, we conclude this discussion by examining the reliability
of the 8-processor PFPP system as a function of the probability of survival of the individual
PEs. Figure 4-3 shows the overall probability of survival as a function of a single PE's probability
of survival, for the cases of $C_{NN} = 0$ and $C_{NN} = 0.2$. For the no-interaction case with 2 spares, the
system probability of 5-year survival stays above 0.9 as long as the single-PE probability is also
above 0.9. With $C_{NN} = 0.2$, this threshold is increased to ~0.925. Increasing the number of spares
to 4 gives a system probability of survival >0.9 as long as an individual PE's probability of survival
is >0.8 for the no-interaction case.

Figure 4-3. Effects of PE reliability.

5. CONCLUSION


We have described the design of a prototype wafer-scale focal plane processor. This design
represents an evolution from previous work in monolithic VLSI, as it has four different cells in
each PE, rather than a relatively uniform array of cells as in earlier work [7,8]. The design incorporates
fault-tolerant technology in order to achieve a five-year lifetime with predicted 97 percent
confidence.

Wafer-scale technology represents an important advance in focal plane processors for space-based
applications. Current processors in aircraft-based applications, with only medium levels
of integration, occupy several cubic feet of volume and dissipate several thousand watts [5]. By
comparison, a processor based on wafer-scale technology would be at least an order of magnitude
smaller in both parameters.

APPENDIX A

MAXIMUM  LIKELIHOOD  APPROACH  TO  GAMMA  CIRCUMVENTION 


Since the early 1970's, there has been a great deal of effort expended on increasing the immunity
of infrared sensor systems to the effects of γ radiation. The most successful approach by far has
been to harden the focal plane. Developments in detector technology have resulted in an enormous
decrease in detector volume, which has reduced the detector cross section by several orders of
magnitude. The consensus is that the easy gains have been made, so that absent any breakthroughs
such as intrinsic event discrimination, the next order-of-magnitude increase will be a lot harder than
the previous four.

At the same time, electronic γ circumvention has made great strides. Given the difficulty of
pushing the state of the art in detectors and the computational resources expended on circumvention,
it is worthwhile to reexamine the field.

The most successful approach to γ circumvention to date is a two-stage algorithm, the spike
adaptive time delay and integration (SATDI) method pioneered by Boeing. This heuristic approach
makes no assumptions about γ-induced noise except that it is an additive corruption of the true
signal (assuming unipolar γ spikes, as we will do throughout). The first stage of the method utilizes
all the TDI signals corresponding to a given point in space to set an upper bound on a reasonable
signal. Signals above this bound are assumed to be corrupted by γ spikes, and are discarded. In
the second stage, detection proceeds normally on the cleaned-up sample.

A.1 MAXIMUM LIKELIHOOD MODEL

If, however, one is willing to make assumptions about the form of the γ-induced noise distribution,
it is possible to design an optimal detector for a given signal in the presence of this noise
utilizing classical maximum likelihood detection theory. The rest of this appendix reports on such
a detector, utilizing the formalism developed in H. L. Van Trees [14], Chapter 2.

The parametric form chosen is an exponential distribution, $\lambda e^{-\lambda r}$, where $r$ is the received
signal, which is a reasonable approximation to the observed γ spectrum in IBC detectors [13]. The
noise model is thus the sum of (a) Gaussian background noise with variance $\sigma^2$ and mean $\mu$, and (b)
exponential γ noise with mean $1/\lambda$ and a probability of occurrence in a given sample $f$. That is,
if $n_T$ is the total noise,

\[ n_T = n_B + n_\gamma \tag{A.1} \]

\[ p_{n_B}(n_B) = N(\mu, \sigma) \tag{A.2} \]

\[ p_{n_\gamma}(n_\gamma) = f \lambda e^{-\lambda r} + (1 - f)\,\delta(0) \quad \text{(ignore multiple hits)} . \tag{A.3} \]

Adding random variables is equivalent to convolving their pdf's, so (adopting the notation of Van
Trees) the probability density under the null hypothesis is

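\[ p_{r_i|H_0} = f \lambda \, Q(-a/\sigma) \exp\!\left[ \tfrac{\lambda}{2} (\sigma^2 \lambda + 2\mu - 2 r_i) \right] + \frac{1 - f}{\sqrt{2\pi}\,\sigma} \exp\!\left[ -\frac{(r_i - \mu)^2}{2\sigma^2} \right] \tag{A.4} \]

\[ \equiv \mathcal{P}(r_i; \mu, \sigma, \lambda, f) \tag{A.5} \]

where

\[ a = r_i - \mu - \sigma^2 \lambda \tag{A.6} \]

\[ Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2} \, dt . \tag{A.7} \]

We let hypothesis $H_1$ correspond to the presence of a constant voltage $m$. This voltage simply
shifts the scale, so that

\[ p_{r_i|H_1} = \mathcal{P}(r_i - m; \mu, \sigma, \lambda, f) . \tag{A.8} \]

Thus the likelihood ratio $\Lambda$ is

\[ \Lambda = \frac{\prod_i \mathcal{P}(r_i - m)}{\prod_i \mathcal{P}(r_i)} \tag{A.9} \]

or, equivalently,

\[ \log(\Lambda) = \sum_i \log(\mathcal{P}(r_i - m)) - \sum_i \log(\mathcal{P}(r_i)) . \tag{A.10} \]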

Note that the parameters $f$ and $\lambda$ may be estimated independently of $\sigma$, $\mu$, and $m$ by monitoring
a small sample of the detector array which is kept outside the field of view of the telescope.
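
A minimal Python sketch of the log likelihood ratio is given below. It takes the null-hypothesis density to be the convolution of the Gaussian background with the exponential-plus-delta γ model, which yields an exponentially modified Gaussian term weighted by f plus a pure Gaussian term weighted by 1 - f. The function names are illustrative, and the parameter values in the usage example are those of the illustrative case in Section A.2; this is a sketch of the model as read here, not the authors' simulation code.

```python
import math


def q_function(x):
    """Gaussian tail integral Q(x) = (1/sqrt(2*pi)) * int_x^inf exp(-t^2/2) dt."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def noise_pdf(r, mu, sigma, lam, f):
    """Density of Gaussian background plus occasional exponential gamma noise."""
    a = r - mu - sigma ** 2 * lam
    gamma_part = (f * lam * q_function(-a / sigma)
                  * math.exp(0.5 * lam * (sigma ** 2 * lam + 2.0 * mu - 2.0 * r)))
    gauss_part = ((1.0 - f) / (math.sqrt(2.0 * math.pi) * sigma)
                  * math.exp(-(r - mu) ** 2 / (2.0 * sigma ** 2)))
    return gamma_part + gauss_part


def log_likelihood_ratio(samples, m, mu, sigma, lam, f):
    """Sum over samples of log P(r - m) - log P(r)."""
    return sum(
        math.log(noise_pdf(r - m, mu, sigma, lam, f))
        - math.log(noise_pdf(r, mu, sigma, lam, f))
        for r in samples
    )


if __name__ == "__main__":
    mu, sigma, lam, m = 400000.0, 632.0, 1.0 / 4000.0, 1265.0
    for f in (0.0, 0.15, 0.3):
        near = log_likelihood_ratio([mu + m], m, mu, sigma, lam, f)
        print(f"f = {f:4.2f}: log-likelihood ratio at r = mu + m is {near:.3f}")
```

For f = 0 the ratio reduces to the familiar linear function of r; for f > 0 and large r it tends toward the constant $\lambda m$ per sample, because a very large sample is almost certainly a γ hit regardless of whether a signal is present.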


A.2 ILLUSTRATIVE EXAMPLE AND COMPARISON WITH SATDI

In this section, we will consider a particular case corresponding to detection against a bright,
i.e., nuclear-induced, background using IBC detectors. The parameters are

Parameter   Value
TDI         8 stages
$\mu$       400000 electrons
$\sigma$    632 (= $\sqrt{\mu}$)
$\lambda$   1/4000
$f$         0.3

The probability density function corresponding to equation A.4 is shown in Figure A-1. It is
characterized by a Gaussian part, which falls off rapidly, and a long exponential tail resulting from
the γ-induced corruption. In Figure A-2 the log of the likelihood ratio (Equation A.10) is plotted
for 3 cases: no γ contamination, 15 percent γs, and 30 percent γs. The signal strength $m$ is taken
to be 1265 electrons, corresponding to twice the standard deviation of the background noise. The
output signal-to-noise ratio is thus $2\sqrt{8}$, or roughly 5.7.

Figure A-1. Probability density function.

Figure A-2. Log likelihood ratio. (Figure annotation: $m = 2\sigma$ (1265).)

For the γ-free case, the log likelihood ratio is just a straight line, reflecting the classical result for
a signal in the presence of Gaussian noise. Addition of γ contamination results in the ratio peaking,
then falling back to an asymptote. Physically, this behavior can be thought of as reflecting the fact
that high received signals are most likely due to γ contamination. The signals of interest to the
detector would thus be those lying in a roughly parabolic region whose bottom is defined by the
number of acceptable false alarms.

In contrast, the effect of SATDI is approximately equivalent to taking the $f = 0$ curve and
cutting it off sharply at some value of $r$, creating a triangular region of interest.

The log likelihood ratio is a nonlinear equation and is difficult to treat analytically, while
SATDI is, by nature, heuristic. Therefore, we compared them using a Monte Carlo simulation.
Events were generated using our model parameters. For the SATDI case, the algorithm used was
that designed for the wafer-scale prototype focal plane processor chip. In this algorithm, $\sigma$ is
estimated as the square root of the average TDI signal, and the γ threshold was set at $1.2\sigma$. The
remaining signals were then averaged and compared to a detection threshold.
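
Event generation for such a Monte Carlo might be sketched in Python as follows, drawing each TDI sample as Gaussian background plus, with probability f, an exponential γ contribution, in line with the noise model of Section A.1. The function names, the seed, and the choice of Python's standard random number generator are assumptions for illustration; the SATDI and maximum likelihood detectors themselves are not reproduced here.

```python
import random


def draw_sample(rng, mu, sigma, lam, f, m=0.0):
    """One TDI sample: optional signal m, Gaussian background, occasional gamma."""
    value = m + rng.gauss(mu, sigma)
    if rng.random() < f:
        value += rng.expovariate(lam)  # exponential gamma noise with mean 1/lam
    return value


def draw_event(rng, n_tdi, mu, sigma, lam, f, m=0.0):
    """One event: n_tdi samples corresponding to the same point in space."""
    return [draw_sample(rng, mu, sigma, lam, f, m) for _ in range(n_tdi)]


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed so the run is repeatable
    mu, sigma, lam, f = 400000.0, 632.0, 1.0 / 4000.0, 0.3
    event = draw_event(rng, 8, mu, sigma, lam, f, m=1265.0)
    print([round(v) for v in event])
```

The mean of a noise-only sample under this model is $\mu + f/\lambda$, which gives a simple sanity check on the generator before any detector is applied.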

For the maximum likelihood case, the log likelihood of each sample was calculated, and the
sum compared to a detection threshold. All other parameters, however, were known and fixed, so
that this simulation does not measure the sensitivity of the method to misestimation of $f$, $\lambda$, and
$\mu$.

The  two  methods  are  compared  in  Figure  A-3  by  plotting  the  receiver  operating  characteristic 
(ROC).  In  an  ROC  plot,  the  probability  of  detection  is  plotted  versus  the  probability  of  false 
alarm.  The  curves  are  approximate  fits  to  the  data  points,  which  have  significant  scatter  due  to 
the  limited  number  of  Monte  Carlo  throws  per  point  (20,000).  A  typical  error  bar  is  shown  for 
reference.  The  maximum  likelihood  curve  lies  significantly  above  the  SATDI  one,  indicating  that 
either  much  better  detection  for  a  fixed  false  alarm  rate,  or  many  fewer  false  alarms  for  a  fixed 
detection  probability  may  be  achieved. 

In conclusion, maximum likelihood detection promises to give significantly better performance
than SATDI within the framework of our model of γ noise. Further work must be performed to
test the validity of the model, as well as to explore its performance in more realistic cases in which
the signal, $m$, is not known a priori. In these cases, $m$ may be estimated as is currently done with
maximum likelihood detection grafted on at the end, or a maximum likelihood estimation technique
may be feasible as well, resulting in a unified detection and estimation scheme.

Figure A-3. Receiver operating characteristic.

APPENDIX B

ROUNDOFF  IN  TDI  SUMMATION 


This  Appendix  describes  a  simulation  of  errors  introduced  by  the  proposed  TDI  summation 
circuit  in  the  prototype  FPP  wafer. 

The circuit should ideally add the $N$ active TDI inputs ($N \le 5$) that emerge from gamma
circumvention, and then divide the result by $N$.

Due to implementation constraints, however, the proposed hardware implementation will look
like this:


Figure B-1. Detail of proposed TDI summation circuit. (Numbers above lines in the figure give the
maximum number of nonzero bits; the output goes to the matched filter and detector.)

As  a  consequence,  the  final  result  may  be  off  by  one  bit  from  the  exact  method  of  letting  the 
intermediate  sums  cascade  up  to  15  bits. 

As might be expected, this effect will be most severe from a percentage point of view when
the background level is very low. It is also most noticeable when the γ level is low as well, since
when one sample has been rejected due to γ contamination, the above method is exact.

The proposed circuit has been modeled and run through a Monte Carlo program which models
the signal and background photons as a Poisson process, and incorporates γ contamination as a
random exponential process according to the model of Section 2.5. The circuit was modeled for
three cases:

(1) Low background, high γ: The mean noise level was 35, the signal was 70, and
the γ level was 40 percent.

(2) Low background, low γ: As item (1), but the γ level was 0 percent.

(3) High background, high γ: The mean noise level was 2000, the signal was 70,
and the γ level was 40 percent.


The  results  are  presented  in  Figures  B-2  (a)  and  (b),  and  B-3  respectively.  The  Monte  Carlos  were 
generated  assuming  a  10:1  scaling  at  the  analog  to  digital  converters.  (See  footnote  in  Section  2.5.1.) 
This  scale  factor  is  used,  for  example,  in  ground-based  visible  data  gathered  by  Lincoln  Laboratory 
Group  94[11]  at  the  Lincoln  Experimental  Test  System  in  New  Mexico. 


Figure B-2. Low background, (a) high gamma and (b) low gamma simulation. (Plot title: roundoff
error, 5000 throws, dim background; horizontal axis: percent error.)


As expected, the most striking error is in the low background, radiation-free case, where 14
percent of the samples will have errors of ±1 percent (Figure B-2(b)). The forward bias is due
to the Monte Carlo program's practice of always rounding 1/2 up. A randomized rounding rule will
equalize it.

Figure B-3. High background, high gamma simulation. (Plot title: roundoff error, 5000 throws,
bright background.)


This  behavior  is  not  a  serious  problem  in  a  prototype  processor,  but  it  would  have  to  be 
addressed  in  a  later,  full-up  system  in  which  (a)  low  background  observations  might  be  an  important 
part  of  the  mission,  or  (b)  the  number  of  TDI  stages  is  increased. 
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